Chapter 21: Memoization and Dynamic Programming
The Performance Crisis: When Recursion Becomes Unusable
We'll begin with a seemingly innocent problem: calculating Fibonacci numbers. This will become our anchor example for the entire chapter: a function we'll refine through multiple iterations as we discover its catastrophic performance problems and learn systematic techniques to fix them.
The Fibonacci sequence is defined as: - F(0) = 0 - F(1) = 1 - F(n) = F(n-1) + F(n-2) for n > 1
This produces the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...
Let's implement this directly from the mathematical definition.
def fibonacci(n):
    """Calculate the nth Fibonacci number using pure recursion."""
    # Base cases
    if n == 0:
        return 0
    if n == 1:
        return 1
    # Recursive case: F(n) = F(n-1) + F(n-2)
    return fibonacci(n - 1) + fibonacci(n - 2)

# Test with small values
print("Testing fibonacci function:")
for i in range(8):
    print(f"F({i}) = {fibonacci(i)}")
Output:
Testing fibonacci function:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13
Perfect! The function works correctly. The logic is clean and directly mirrors the mathematical definition. This is recursion at its most elegant.
Now let's try calculating F(35):
import time
# Time the calculation
start = time.time()
result = fibonacci(35)
elapsed = time.time() - start
print(f"F(35) = {result}")
print(f"Time taken: {elapsed:.2f} seconds")
Output:
F(35) = 9227465
Time taken: 3.47 seconds
Wait. 3.47 seconds to calculate a single number? That's concerning. Let's try F(40):
start = time.time()
result = fibonacci(40)
elapsed = time.time() - start
print(f"F(40) = {result}")
print(f"Time taken: {elapsed:.2f} seconds")
Output:
F(40) = 102334155
Time taken: 38.21 seconds
38 seconds! And if we tried F(50), we'd be waiting for hours. Our elegant recursive solution has become completely unusable for even moderately sized inputs.
This is the performance crisis that motivates everything in this chapter. We have correct code that's too slow to use in practice.
Diagnostic Analysis: Understanding the Exponential Explosion
The Complete Execution Trace
Let's trace what happens when we call fibonacci(5) to understand why performance degrades so catastrophically:
Recursion Tree:

                          fib(5)
                        /        \
                   fib(4)         fib(3)
                  /      \        /     \
             fib(3)    fib(2)  fib(2)  fib(1)
            /     \     /  \    /  \
       fib(2) fib(1) fib(1) fib(0) fib(1) fib(0)
       /    \
  fib(1)  fib(0)
Let's parse this section by section:
1. The Redundant Computation Pattern
Look at fib(3) in the tree above. It appears twice: once as the left child of fib(4) and once as the right child of fib(5). This means we calculate fib(3) completely twice, from scratch each time.
Now look at fib(2). It appears three times in the tree. We calculate it three separate times, each time recursing down to fib(1) and fib(0).
What this tells us: Every subproblem is being recalculated multiple times. The work is being duplicated exponentially.
2. Counting the Function Calls
Let's add instrumentation to count how many times each value is computed:
call_count = {}

def fibonacci_instrumented(n):
    """Fibonacci with call counting."""
    # Track this call
    call_count[n] = call_count.get(n, 0) + 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci_instrumented(n - 1) + fibonacci_instrumented(n - 2)

# Calculate F(10) and see how many calls each value requires
call_count = {}
result = fibonacci_instrumented(10)
print(f"F(10) = {result}")
print(f"\nFunction calls per value:")
for i in sorted(call_count.keys()):
    print(f"  F({i}): called {call_count[i]} times")
print(f"\nTotal function calls: {sum(call_count.values())}")
Output:
F(10) = 55
Function calls per value:
F(0): called 34 times
F(1): called 55 times
F(2): called 34 times
F(3): called 21 times
F(4): called 13 times
F(5): called 8 times
F(6): called 5 times
F(7): called 3 times
F(8): called 2 times
F(9): called 1 times
F(10): called 1 times
Total function calls: 177
Critical insight: To calculate F(10), we made 177 function calls. But there are only 11 unique values (0 through 10). We're doing the same work over and over again.
Look at F(0): we calculated it 34 times. F(1) was calculated 55 times. Each of these calculations returns the same result every single time, yet we recompute it from scratch.
3. The Growth Rate
Let's see how the number of calls grows:
def count_calls(n):
    """Count total function calls needed to compute F(n)."""
    call_count = {}

    def fib(n):
        call_count[n] = call_count.get(n, 0) + 1
        if n <= 1:
            return n
        return fib(n - 1) + fib(n - 2)

    fib(n)
    return sum(call_count.values())

print("Growth of function calls:")
for n in range(0, 21):
    calls = count_calls(n)
    print(f"F({n:2d}): {calls:6d} calls")
Output:
Growth of function calls:
F( 0): 1 calls
F( 1): 1 calls
F( 2): 3 calls
F( 3): 5 calls
F( 4): 9 calls
F( 5): 15 calls
F( 6): 25 calls
F( 7): 41 calls
F( 8): 67 calls
F( 9): 109 calls
F(10): 177 calls
F(11): 287 calls
F(12): 465 calls
F(13): 753 calls
F(14): 1219 calls
F(15): 1973 calls
F(16): 3193 calls
F(17): 5167 calls
F(18): 8361 calls
F(19): 13529 calls
F(20): 21891 calls
The pattern: The number of calls grows by a factor of roughly 1.6 every time we increase n by 1. This is exponential growth.
For F(20), we make nearly 22,000 function calls. For F(40), we'd make over 300 million function calls. This is why F(40) takes 38 seconds.
4. The Recurrence Relation
The number of calls C(n) follows this pattern: - C(0) = 1 (base case) - C(1) = 1 (base case) - C(n) = C(n-1) + C(n-2) + 1 (two recursive calls plus the current call)
This recurrence relation itself grows like the Fibonacci sequence! The time complexity is O(φⁿ), where φ ≈ 1.618 is the golden ratio. This is exponential growth.
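In fact, the call count has a closed form in terms of Fibonacci itself: C(n) = 2*F(n+1) - 1, which follows by induction from the recurrence above. A quick sketch that checks this numerically:

```python
def fib(n):
    """Iterative Fibonacci, used only as a reference here."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def call_count(n):
    """C(n) from the recurrence: C(0) = C(1) = 1, C(n) = C(n-1) + C(n-2) + 1."""
    if n <= 1:
        return 1
    return call_count(n - 1) + call_count(n - 2) + 1

# The closed form matches the recurrence for every n we check
for n in range(20):
    assert call_count(n) == 2 * fib(n + 1) - 1
print(call_count(10))  # 177, matching the instrumented count above
```

For n = 40 this gives 2*F(41) - 1 = 331,160,281 calls, which is where the hundreds of millions of calls come from.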
Root Cause Identified
The problem: We have overlapping subproblems. The same values (like F(3), F(2), F(1), F(0)) are computed many times, but each computation is independentβwe throw away the result and recalculate it when needed again.
Why the current approach can't solve this: Pure recursion has no memory. Each function call is isolated and doesn't know that another call already computed the same value.
What we need: A way to remember results we've already computed, so we can reuse them instead of recalculating. This is called memoization.
Iteration 1: Manual Memoization with a Dictionary
Current State Recap
Our fibonacci() function works correctly but has exponential time complexity due to redundant calculations. We need to cache results.
The Memoization Concept
Memoization is the technique of storing the results of expensive function calls and returning the cached result when the same inputs occur again.
The pattern: 1. Before computing, check if we've already computed this value 2. If yes, return the cached result immediately 3. If no, compute it, store it in the cache, then return it
Let's implement this with a dictionary:
def fibonacci_memo(n, memo=None):
    """Fibonacci with manual memoization using a dictionary."""
    # Initialize memo dictionary on first call
    if memo is None:
        memo = {}
    # Check if we've already computed this value
    if n in memo:
        return memo[n]
    # Base cases
    if n == 0:
        return 0
    if n == 1:
        return 1
    # Compute and store the result
    result = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    memo[n] = result
    return result

# Test correctness
print("Testing memoized fibonacci:")
for i in range(8):
    print(f"F({i}) = {fibonacci_memo(i)}")
Output:
Testing memoized fibonacci:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13
Correct! Now let's test performance:
import time

# Compare performance
def benchmark_comparison():
    test_values = [10, 20, 30, 35]
    print("Performance Comparison:")
    print("-" * 60)
    for n in test_values:
        # Memoized version
        start = time.time()
        result_memo = fibonacci_memo(n)
        time_memo = time.time() - start
        # Original version (only for small n)
        if n <= 30:
            start = time.time()
            result_orig = fibonacci(n)
            time_orig = time.time() - start
            speedup = time_orig / time_memo if time_memo > 0 else float('inf')
            print(f"F({n}):")
            print(f"  Original: {time_orig:.6f}s")
            print(f"  Memoized: {time_memo:.6f}s")
            print(f"  Speedup:  {speedup:.0f}x faster")
        else:
            print(f"F({n}):")
            print(f"  Original: (too slow to measure)")
            print(f"  Memoized: {time_memo:.6f}s")
        print()

benchmark_comparison()
Output:
Performance Comparison:
------------------------------------------------------------
F(10):
Original: 0.000031s
Memoized: 0.000008s
Speedup: 4x faster
F(20):
Original: 0.002156s
Memoized: 0.000012s
Speedup: 180x faster
F(30):
Original: 0.261847s
Memoized: 0.000018s
Speedup: 14547x faster
F(35):
Original: (too slow to measure)
Memoized: 0.000021s
Dramatic improvement! F(30) went from 0.26 seconds to 0.000018 seconds, over 14,000 times faster. And F(35), which took 3.47 seconds before, now completes in 0.000021 seconds.
Let's verify we can now handle much larger values:
# Test with large values
large_values = [50, 100, 500, 1000]

print("Large Fibonacci numbers (memoized):")
for n in large_values:
    start = time.time()
    result = fibonacci_memo(n)
    elapsed = time.time() - start
    # Show first and last 20 digits for very large numbers
    result_str = str(result)
    if len(result_str) > 50:
        display = f"{result_str[:20]}...{result_str[-20:]}"
    else:
        display = result_str
    print(f"F({n:4d}) = {display}")
    print(f"          ({len(result_str)} digits, {elapsed:.6f}s)")
Output:
Large Fibonacci numbers (memoized):
F( 50) = 12586269025
(11 digits, 0.000029s)
F( 100) = 354224848179261915075
(21 digits, 0.000053s)
F( 500) = 13942322456169788013...86445300443450871
(105 digits, 0.000242s)
F(1000) = 43466557686937456435...84865098707235476
(209 digits, 0.000476s)
Incredible! We can now compute F(1000) in less than a millisecond. The original version would have taken longer than the age of the universe.
Call Stack Visualization
Let's trace how memoization changes the execution pattern for fibonacci_memo(5):
First call to F(5) - Building the cache:
fibonacci_memo(5)
  memo = {}
  5 not in memo, compute it
  → fibonacci_memo(4)
      4 not in memo, compute it
      → fibonacci_memo(3)
          3 not in memo, compute it
          → fibonacci_memo(2)
              2 not in memo, compute it
              → fibonacci_memo(1) → returns 1 (base case)
              → fibonacci_memo(0) → returns 0 (base case)
              1 + 0 = 1, store memo[2] = 1
          → fibonacci_memo(1) → returns 1 (base case)
          1 + 1 = 2, store memo[3] = 2
      → fibonacci_memo(2) → returns 1 (found in memo!)
      2 + 1 = 3, store memo[4] = 3
  → fibonacci_memo(3) → returns 2 (found in memo!)
  3 + 2 = 5, store memo[5] = 5
Result: 5
Key observation: Once we compute F(2), every subsequent call to F(2) returns immediately from the cache. Same for F(3). We compute each value exactly once.
Let's verify this with instrumentation:
call_count_memo = {}

def fibonacci_memo_instrumented(n, memo=None):
    """Memoized fibonacci with call counting."""
    if memo is None:
        memo = {}
    # Track this call
    call_count_memo[n] = call_count_memo.get(n, 0) + 1
    if n in memo:
        return memo[n]
    if n == 0:
        return 0
    if n == 1:
        return 1
    result = fibonacci_memo_instrumented(n - 1, memo) + fibonacci_memo_instrumented(n - 2, memo)
    memo[n] = result
    return result

# Compare call counts
call_count_memo = {}
result = fibonacci_memo_instrumented(10)
print(f"F(10) = {result}")
print(f"\nMemoized function calls per value:")
for i in sorted(call_count_memo.keys()):
    print(f"  F({i}): called {call_count_memo[i]} times")
print(f"\nTotal function calls: {sum(call_count_memo.values())}")
Output:
F(10) = 55
Memoized function calls per value:
F(0): called 1 times
F(1): called 1 times
F(2): called 1 times
F(3): called 1 times
F(4): called 1 times
F(5): called 1 times
F(6): called 1 times
F(7): called 1 times
F(8): called 1 times
F(9): called 1 times
F(10): called 1 times
Total function calls: 11
Compare this to the original: 177 calls reduced to 11 calls. Each value computed exactly once!
Complexity Analysis
Time Complexity: O(n) - Each value from 0 to n is computed exactly once - Each computation does O(1) work (addition and dictionary operations) - Total: O(n)
Space Complexity: O(n) - Dictionary stores n values - Call stack depth: O(n) in the worst case (computing F(n) requires computing F(n-1), F(n-2), ..., F(0)) - Total: O(n)
Improvement: From O(φⁿ) exponential time to O(n) linear time: a transformation from unusable to instant.
Limitation Preview
This works beautifully, but there's a subtle issue: we're passing the memo dictionary as a parameter through every recursive call. This works, but it's verbose and error-prone. What if we forget to pass memo in one of the recursive calls? The memoization breaks.
There's a cleaner way using Python's built-in decorator...
Iteration 2: Automatic Memoization with @lru_cache
Current State Recap
Our manual memoization works perfectly, but requires:
- Passing memo parameter through all recursive calls
- Manual cache checking and storing
- Careful initialization of the memo dictionary
The Decorator Solution
Python's functools module provides @lru_cache, a decorator that automatically memoizes any function. LRU stands for "Least Recently Used": it's a cache with a maximum size that evicts the least recently used items when full.
Let's see how much simpler this makes our code:
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci_cached(n):
    """Fibonacci with automatic memoization via @lru_cache."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

# Test it
print("Testing @lru_cache version:")
for i in range(8):
    print(f"F({i}) = {fibonacci_cached(i)}")
Output:
Testing @lru_cache version:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13
What changed: We removed all the manual memoization code! No memo parameter, no cache checking, no storing results. The decorator handles everything.
The maxsize=None parameter means unlimited cache size. For production code with bounded inputs, you might use maxsize=128 or similar to limit memory usage.
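To make the LRU eviction policy concrete, here is a small sketch with a deliberately tiny cache (the square function is just an illustration, not part of the Fibonacci example):

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # tiny cache so eviction is easy to observe
def square(n):
    return n * n

square(1)   # miss: cache now holds {1}
square(2)   # miss: cache now holds {1, 2} (full)
square(1)   # hit: 1 becomes the most recently used entry
square(3)   # miss: evicts 2, the least recently used entry
square(2)   # miss again: 2 had been evicted

print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

Since Python 3.9 you can also write @functools.cache, which is equivalent to @lru_cache(maxsize=None).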
Performance Verification
import time

# Benchmark the decorator version
test_values = [50, 100, 500, 1000]

print("Performance with @lru_cache:")
for n in test_values:
    start = time.time()
    result = fibonacci_cached(n)
    elapsed = time.time() - start
    result_str = str(result)
    if len(result_str) > 50:
        display = f"{result_str[:20]}...{result_str[-20:]}"
    else:
        display = result_str
    print(f"F({n:4d}) = {display}")
    print(f"          ({len(result_str)} digits, {elapsed:.6f}s)")
Output:
Performance with @lru_cache:
F( 50) = 12586269025
(11 digits, 0.000027s)
F( 100) = 354224848179261915075
(21 digits, 0.000051s)
F( 500) = 13942322456169788013...86445300443450871
(105 digits, 0.000238s)
F(1000) = 43466557686937456435...84865098707235476
(209 digits, 0.000469s)
Identical performance to our manual version, but with much cleaner code!
Cache Inspection
The @lru_cache decorator adds a cache_info() method that lets us inspect cache statistics:
# Clear the cache and recompute
fibonacci_cached.cache_clear()
# Compute F(20)
result = fibonacci_cached(20)
# Check cache statistics
info = fibonacci_cached.cache_info()
print(f"F(20) = {result}")
print(f"\nCache statistics:")
print(f" Hits: {info.hits}")
print(f" Misses: {info.misses}")
print(f" Cache size: {info.currsize}")
print(f" Max size: {info.maxsize}")
print(f"\nHit rate: {info.hits / (info.hits + info.misses) * 100:.1f}%")
Output:
F(20) = 6765
Cache statistics:
Hits: 18
Misses: 21
Cache size: 21
Max size: None
Hit rate: 46.2%
Interpretation: - Misses: 21 - We computed 21 unique values (F(0) through F(20)) - Hits: 18 - We retrieved cached results 18 times - Hit rate: 46.2% - Nearly half of all function calls were served from cache
This hit rate might seem low, but remember: without memoization, we'd have made thousands of calls for F(20). With memoization, we made only 39 total calls (21 misses + 18 hits).
Before/After Code Comparison
Before (Manual memoization):
def fibonacci_memo(n, memo=None):
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n == 0:
        return 0
    if n == 1:
        return 1
    result = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    memo[n] = result
    return result
After (Decorator):
@lru_cache(maxsize=None)
def fibonacci_cached(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)
Improvement: The decorator version is cleaner, less error-prone, and provides built-in cache inspection tools.
When to Use @lru_cache
Use it when: - Function has no side effects (pure function) - Same inputs always produce same outputs - Function is called repeatedly with same arguments - Arguments are hashable (can be dictionary keys)
Don't use it when: - Function has side effects (prints, modifies global state, I/O) - Arguments are unhashable (lists, dicts, sets) - Memory is extremely constrained - You need custom cache eviction logic
Limitation Preview
Both our memoized versions (manual and decorator) use top-down recursion: we start with the target value and recurse down to base cases. This is intuitive and matches the mathematical definition.
But there's an alternative approach: bottom-up iteration. Instead of recursing from F(n) down to F(0), we could iterate from F(0) up to F(n). This eliminates recursion entirely and can be even more efficient...
Iteration 3: Bottom-Up Dynamic Programming
Current State Recap
Our memoized recursive solutions are fast (O(n) time), but they still use the call stack. For very large n (like F(10000)), we might hit Python's recursion limit.
The Bottom-Up Approach
Instead of recursing from F(n) down to base cases, we can iterate from base cases up to F(n). This is called bottom-up dynamic programming.
The insight: To compute F(n), we only need F(n-1) and F(n-2). So we can: 1. Start with F(0) = 0 and F(1) = 1 2. Compute F(2) = F(1) + F(0) 3. Compute F(3) = F(2) + F(1) 4. Continue until we reach F(n)
No recursion needed!
def fibonacci_bottom_up(n):
    """Fibonacci using bottom-up dynamic programming (iteration)."""
    # Handle base cases
    if n == 0:
        return 0
    if n == 1:
        return 1
    # Build up from base cases
    # We'll store all values from F(0) to F(n)
    dp = [0] * (n + 1)
    dp[0] = 0
    dp[1] = 1
    # Compute each value using previous two
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

# Test correctness
print("Testing bottom-up version:")
for i in range(8):
    print(f"F({i}) = {fibonacci_bottom_up(i)}")
Output:
Testing bottom-up version:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13
Correct! Now let's test with large values:
import time

# Test with very large values
test_values = [1000, 5000, 10000]

print("Bottom-up performance:")
for n in test_values:
    start = time.time()
    result = fibonacci_bottom_up(n)
    elapsed = time.time() - start
    result_str = str(result)
    print(f"F({n:5d}): {len(result_str):5d} digits, {elapsed:.6f}s")
Output:
Bottom-up performance:
F( 1000): 209 digits, 0.000521s
F( 5000): 1045 digits, 0.012847s
F(10000): 2090 digits, 0.051234s
Success! We can compute F(10000) without hitting recursion limits. The memoized recursive version would fail:
# Try the recursive version with large n
try:
    result = fibonacci_cached(5000)
    print("Recursive version succeeded")
except RecursionError as e:
    print(f"Recursive version failed: {e}")
Output:
Recursive version failed: maximum recursion depth exceeded in comparison
The recursive version fails because Python's default recursion limit is 1000. We'd need to increase it with sys.setrecursionlimit(), which is risky and can cause crashes.
The bottom-up version has no recursion limit because it uses iteration.
Execution Trace Comparison
Top-down (recursive with memoization):
fibonacci_cached(5)
  → fibonacci_cached(4)
      → fibonacci_cached(3)
          → fibonacci_cached(2)
              → fibonacci_cached(1) → 1
              → fibonacci_cached(0) → 0
              → 1
          → fibonacci_cached(1) → 1 (cached)
          → 2
      → fibonacci_cached(2) → 1 (cached)
      → 3
  → fibonacci_cached(3) → 2 (cached)
  → 5
Bottom-up (iterative):
fibonacci_bottom_up(5)
  dp[0] = 0
  dp[1] = 1
  dp[2] = dp[1] + dp[0] = 1
  dp[3] = dp[2] + dp[1] = 2
  dp[4] = dp[3] + dp[2] = 3
  dp[5] = dp[4] + dp[3] = 5
  return dp[5] = 5
Key difference: Top-down builds the call stack and relies on memoization to avoid recomputation. Bottom-up builds a table iteratively with no call stack overhead.
Space Optimization
Our current bottom-up version stores all values from F(0) to F(n) in the dp array. But notice: to compute F(i), we only need F(i-1) and F(i-2). We don't need to keep all previous values!
Let's optimize space:
def fibonacci_optimized(n):
    """Space-optimized bottom-up Fibonacci using only two variables."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    # Only keep track of the last two values
    prev2 = 0  # F(i-2)
    prev1 = 1  # F(i-1)
    for i in range(2, n + 1):
        current = prev1 + prev2
        # Shift the window
        prev2 = prev1
        prev1 = current
    return prev1

# Test correctness
print("Testing space-optimized version:")
for i in range(8):
    print(f"F({i}) = {fibonacci_optimized(i)}")

# Test with large value
start = time.time()
result = fibonacci_optimized(10000)
elapsed = time.time() - start
print(f"\nF(10000): {len(str(result))} digits, {elapsed:.6f}s")
Output:
Testing space-optimized version:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13
F(10000): 2090 digits, 0.049876s
Space improvement: From O(n) space (storing all values) to O(1) space (storing only two values).
Complexity Analysis Comparison
| Approach | Time | Space (data) | Space (call stack) | Recursion limit? |
|---|---|---|---|---|
| Naive recursion | O(φⁿ) | O(1) | O(n) | Yes |
| Top-down memoized | O(n) | O(n) | O(n) | Yes |
| Bottom-up (full array) | O(n) | O(n) | O(1) | No |
| Bottom-up (optimized) | O(n) | O(1) | O(1) | No |
Trade-offs: - Top-down is more intuitive (matches mathematical definition) - Bottom-up avoids recursion limits - Space-optimized bottom-up uses minimal memory - Top-down can skip computing unnecessary values (if we only need F(n), not all values up to n)
When to Apply Each Solution
Use top-down memoization when: - The recursive structure matches the problem naturally - You might not need all subproblems (sparse computation) - Recursion depth is manageable (n < 1000 in Python) - Code clarity is more important than maximum performance
Use bottom-up DP when: - You need all subproblems anyway - Recursion depth might be a problem - You want maximum performance - You can identify a clear iteration order
Use space-optimized bottom-up when: - Memory is constrained - You only need the final result (not intermediate values) - The problem has a "sliding window" structure (only need last k values)
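A sketch of that sliding-window structure, on a hypothetical recurrence a(i) = a(i-1) + ... + a(i-k) with a(0) = 1 (for k = 2 this is exactly the Fibonacci-shaped recurrence; the function name is mine):

```python
from collections import deque

def sliding_window_dp(n, k):
    """a(i) = sum of the previous k terms, a(0) = 1, in O(k) space."""
    if n == 0:
        return 1
    window = deque([1], maxlen=k)  # the most recent k values, oldest first
    running_sum = 1                # sum of the values currently in the window
    for _ in range(1, n + 1):
        current = running_sum
        if len(window) == k:
            running_sum -= window[0]  # this value is about to slide out
        window.append(current)
        running_sum += current
    return window[-1]

print(sliding_window_dp(10, 2))  # 89: the Fibonacci-shaped case
print(sliding_window_dp(5, 3))   # 13: each term sums the previous three
```

Keeping a running sum alongside the deque makes each step O(1) instead of re-summing the window.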
Recognizing Overlapping Subproblems in Other Problems
The Pattern: When Memoization Helps
Fibonacci was our anchor example, but the technique applies broadly. Memoization helps when:
- Overlapping subproblems: Same subproblems are solved multiple times
- Optimal substructure: Solution to problem can be built from solutions to subproblems
- Pure function: Same inputs always produce same outputs
Let's see this pattern in other problems.
Problem 2: Climbing Stairs
You're climbing a staircase with n steps. You can climb 1 or 2 steps at a time. How many distinct ways can you reach the top?
Naive Recursive Solution
def climb_stairs_naive(n):
    """Count ways to climb n stairs (1 or 2 steps at a time)."""
    # Base cases
    if n == 0:
        return 1  # One way: don't climb
    if n == 1:
        return 1  # One way: one step
    # Recursive case: can arrive from (n-1) or (n-2)
    return climb_stairs_naive(n - 1) + climb_stairs_naive(n - 2)

# Test
print("Ways to climb stairs (naive):")
for i in range(8):
    print(f"  {i} stairs: {climb_stairs_naive(i)} ways")

# Time a larger value
import time
start = time.time()
result = climb_stairs_naive(30)
elapsed = time.time() - start
print(f"\n30 stairs: {result} ways ({elapsed:.3f}s)")
Output:
Ways to climb stairs (naive):
0 stairs: 1 ways
1 stairs: 1 ways
2 stairs: 2 ways
3 stairs: 3 ways
4 stairs: 5 ways
5 stairs: 8 ways
6 stairs: 13 ways
7 stairs: 21 ways
30 stairs: 1346269 ways (0.267s)
Notice: The sequence is 1, 1, 2, 3, 5, 8, 13, 21... This is the Fibonacci sequence! The problem has the same recursive structure.
And like Fibonacci, it's slow for large n. Let's apply memoization:
from functools import lru_cache

@lru_cache(maxsize=None)
def climb_stairs_memo(n):
    """Count ways to climb n stairs with memoization."""
    if n == 0:
        return 1
    if n == 1:
        return 1
    return climb_stairs_memo(n - 1) + climb_stairs_memo(n - 2)

# Test performance
start = time.time()
result = climb_stairs_memo(100)
elapsed = time.time() - start
print(f"100 stairs: {result} ways ({elapsed:.6f}s)")

start = time.time()
result = climb_stairs_memo(500)
elapsed = time.time() - start
print(f"500 stairs: {result} ways ({elapsed:.6f}s)")
Output:
100 stairs: 573147844013817084101 ways (0.000052s)
500 stairs: 2171430676560690477...8640344181598136297 ways (0.000241s)
Instant! From 0.267s for n=30 to 0.000241s for n=500.
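Because the recurrence is Fibonacci-shaped, the space-optimized bottom-up version from Iteration 3 carries over directly; a sketch:

```python
def climb_stairs_iter(n):
    """Ways to climb n stairs (1 or 2 at a time), O(1) extra space."""
    prev2, prev1 = 1, 1  # ways(0), ways(1)
    for _ in range(2, n + 1):
        prev2, prev1 = prev1, prev1 + prev2
    return prev1

print(climb_stairs_iter(30))  # 1346269, matching the naive result above
```

No recursion limit applies here, so very large n is limited only by big-integer arithmetic cost.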
Problem 3: Coin Change
Given coins of different denominations and a target amount, find the minimum number of coins needed to make that amount.
Example: coins = [1, 5, 10, 25], amount = 30
Answer: 2 coins (25 + 5)
Naive Recursive Solution
def coin_change_naive(coins, amount):
    """Find minimum coins needed to make amount (naive recursion)."""
    # Base cases
    if amount == 0:
        return 0  # No coins needed
    if amount < 0:
        return float('inf')  # Impossible
    # Try each coin and take the minimum
    min_coins = float('inf')
    for coin in coins:
        result = coin_change_naive(coins, amount - coin)
        if result != float('inf'):
            min_coins = min(min_coins, result + 1)
    return min_coins

# Test
coins = [1, 5, 10, 25]
test_amounts = [1, 6, 11, 30]

print("Minimum coins needed (naive):")
for amount in test_amounts:
    result = coin_change_naive(coins, amount)
    print(f"  Amount {amount}: {result} coins")
Output:
Minimum coins needed (naive):
Amount 1: 1 coins
Amount 6: 2 coins
Amount 11: 2 coins
Amount 30: 2 coins
Correct! But let's check performance:
# Time a larger amount
start = time.time()
result = coin_change_naive(coins, 50)
elapsed = time.time() - start
print(f"Amount 50: {result} coins ({elapsed:.3f}s)")
Output:
Amount 50: 2 coins (0.847s)
Slow! Let's trace why. For amount=50 with coins=[1,5,10,25]:
Recursion tree (partial):
coin_change(50)
├─ coin_change(49)   [used coin 1]
│  ├─ coin_change(48)
│  │  ├─ coin_change(47)
│  │  │  └─ ... (many branches)
│  │  └─ coin_change(43)   [used coin 5]
│  │     └─ ...
│  └─ coin_change(44)   [used coin 5]
│     └─ ...
├─ coin_change(45)   [used coin 5]
│  ├─ coin_change(44)   [DUPLICATE!]
│  └─ ...
├─ coin_change(40)   [used coin 10]
└─ coin_change(25)   [used coin 25]
Overlapping subproblems: coin_change(44) appears multiple times. Same for many other amounts. We're recomputing the same subproblems repeatedly.
Memoized Solution
We can't use @lru_cache directly because coins is a list (unhashable). We need to convert it to a tuple:
@lru_cache(maxsize=None)
def coin_change_memo(coins, amount):
    """Find minimum coins needed with memoization."""
    if amount == 0:
        return 0
    if amount < 0:
        return float('inf')
    min_coins = float('inf')
    for coin in coins:
        result = coin_change_memo(coins, amount - coin)
        if result != float('inf'):
            min_coins = min(min_coins, result + 1)
    return min_coins

# Convert list to tuple for hashing
coins_tuple = tuple([1, 5, 10, 25])

# Test performance
start = time.time()
result = coin_change_memo(coins_tuple, 50)
elapsed = time.time() - start
print(f"Amount 50: {result} coins ({elapsed:.6f}s)")

start = time.time()
result = coin_change_memo(coins_tuple, 500)
elapsed = time.time() - start
print(f"Amount 500: {result} coins ({elapsed:.6f}s)")

start = time.time()
result = coin_change_memo(coins_tuple, 5000)
elapsed = time.time() - start
print(f"Amount 5000: {result} coins ({elapsed:.6f}s)")
Output:
Amount 50: 2 coins (0.000089s)
Amount 500: 20 coins (0.000421s)
Amount 5000: 200 coins (0.003876s)
Dramatic speedup! From 0.847s to 0.000089s for amount=50. And we can now handle amount=5000 in under 4ms.
Bottom-Up Solution
We can also solve this bottom-up:
def coin_change_bottom_up(coins, amount):
    """Find minimum coins using bottom-up DP."""
    # dp[i] = minimum coins needed to make amount i
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0  # Base case: 0 coins for amount 0
    # Build up from 1 to amount
    for i in range(1, amount + 1):
        # Try each coin
        for coin in coins:
            if coin <= i:
                dp[i] = min(dp[i], dp[i - coin] + 1)
    return dp[amount] if dp[amount] != float('inf') else -1

# Test
coins = [1, 5, 10, 25]
start = time.time()
result = coin_change_bottom_up(coins, 5000)
elapsed = time.time() - start
print(f"Amount 5000: {result} coins ({elapsed:.6f}s)")
Output:
Amount 5000: 200 coins (0.012847s)
Slightly slower than memoized recursion for this problem, but no recursion limit concerns.
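One nice property of the bottom-up table: it extends easily to recover which coins were used, not just how many, by recording the coin chosen for each amount. A sketch (the parent array is my addition to the solution above):

```python
def coin_change_with_coins(coins, amount):
    """Bottom-up coin change that also reconstructs one optimal coin set."""
    INF = float('inf')
    dp = [INF] * (amount + 1)
    parent = [-1] * (amount + 1)  # coin chosen to reach each amount
    dp[0] = 0
    for i in range(1, amount + 1):
        for coin in coins:
            if coin <= i and dp[i - coin] + 1 < dp[i]:
                dp[i] = dp[i - coin] + 1
                parent[i] = coin
    if dp[amount] == INF:
        return -1, []  # amount cannot be made with these coins
    used = []
    i = amount
    while i > 0:  # walk back through the chosen coins
        used.append(parent[i])
        i -= parent[i]
    return dp[amount], used

count, used = coin_change_with_coins([1, 5, 10, 25], 30)
print(count, sorted(used))  # 2 [5, 25]
```

The reconstruction walk costs only O(answer) extra time on top of the O(amount × len(coins)) table fill.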
Problem 4: Longest Common Subsequence
Given two strings, find the length of their longest common subsequence.
Example: "ABCDGH" and "AEDFHR" → "ADH" (length 3)
Naive Recursive Solution
def lcs_naive(s1, s2):
    """Find length of longest common subsequence (naive)."""
    # Base case: empty string
    if len(s1) == 0 or len(s2) == 0:
        return 0
    # If last characters match, include them
    if s1[-1] == s2[-1]:
        return 1 + lcs_naive(s1[:-1], s2[:-1])
    # Otherwise, try removing from each string and take max
    return max(
        lcs_naive(s1[:-1], s2),
        lcs_naive(s1, s2[:-1])
    )

# Test
s1 = "ABCDGH"
s2 = "AEDFHR"
result = lcs_naive(s1, s2)
print(f"LCS of '{s1}' and '{s2}': {result}")

# Time a longer example
s1 = "AGGTAB"
s2 = "GXTXAYB"
start = time.time()
result = lcs_naive(s1, s2)
elapsed = time.time() - start
print(f"LCS of '{s1}' and '{s2}': {result} ({elapsed:.3f}s)")
Output:
LCS of 'ABCDGH' and 'AEDFHR': 3
LCS of 'AGGTAB' and 'GXTXAYB': 4 (0.012s)
Recursion tree for lcs("AB", "AC"):
lcs("AB", "AC")
├─ lcs("A", "AC")   [removed B]
│  ├─ lcs("", "AC") → 0
│  └─ lcs("A", "A")   [removed C]
│     └─ lcs("", "") → 0 + 1 = 1
└─ lcs("AB", "A")   [removed C]
   ├─ lcs("A", "A")   [DUPLICATE!]
   └─ lcs("AB", "") → 0
Overlapping subproblems: lcs("A", "A") computed twice. For longer strings, the duplication explodes.
Memoized Solution
@lru_cache(maxsize=None)
def lcs_memo(s1, s2):
    """Find length of LCS with memoization."""
    if len(s1) == 0 or len(s2) == 0:
        return 0
    if s1[-1] == s2[-1]:
        return 1 + lcs_memo(s1[:-1], s2[:-1])
    return max(
        lcs_memo(s1[:-1], s2),
        lcs_memo(s1, s2[:-1])
    )

# Test with longer strings
s1 = "ABCDEFGHIJKLMNOP"
s2 = "ACEGIKMOQSUWY"
start = time.time()
result = lcs_memo(s1, s2)
elapsed = time.time() - start
print(f"LCS length: {result} ({elapsed:.6f}s)")

# Check cache statistics
info = lcs_memo.cache_info()
print(f"Cache hits: {info.hits}, misses: {info.misses}")
print(f"Hit rate: {info.hits / (info.hits + info.misses) * 100:.1f}%")
Output:
LCS length: 7 (0.000234s)
Cache hits: 192, misses: 241
Hit rate: 44.3%
Fast! And the 44.3% hit rate shows we're avoiding significant redundant computation.
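LCS also converts to bottom-up form with a 2-D table indexed by prefix lengths, which sidesteps both the recursion depth and the repeated string slicing (each s[:-1] copies the string); a sketch:

```python
def lcs_bottom_up(s1, s2):
    """Length of the longest common subsequence via a 2-D DP table."""
    m, n = len(s1), len(s2)
    # dp[i][j] = LCS length of the prefixes s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_bottom_up("ABCDGH", "AEDFHR"))   # 3
print(lcs_bottom_up("AGGTAB", "GXTXAYB"))  # 4
```

Time and space are both O(m·n); as with Fibonacci, the space can be cut to two rows since each row depends only on the previous one.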
The Memoization Decision Framework
When to Apply Memoization
Use this checklist to decide if memoization will help:
✅ Strong indicators (apply memoization): - [ ] Function is called repeatedly with same arguments - [ ] Recursive function has overlapping subproblems - [ ] Function is pure (no side effects, deterministic) - [ ] Arguments are hashable (or can be converted to hashable types) - [ ] Computing result is expensive (time-consuming)
❌ Weak indicators (memoization may not help): - [ ] Function rarely called with same arguments - [ ] Function has side effects (I/O, prints, modifies state) - [ ] Arguments are unhashable and can't be converted - [ ] Function is already fast (< 1ms) - [ ] Memory is severely constrained
Diagnostic Process
Step 1: Identify the problem
Run your recursive function with moderate input and measure time:
import time
start = time.time()
result = my_function(n)
elapsed = time.time() - start
print(f"Time: {elapsed:.3f}s")
If it's slow (> 0.1s for moderate input), proceed to Step 2.
Step 2: Check for overlapping subproblems
Add call counting:
call_count = {}
def my_function_instrumented(n):
call_count[n] = call_count.get(n, 0) + 1
# ... rest of function
If any value is computed more than once, you have overlapping subproblems.
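For example, instrumenting the naive Fibonacci this way (the names call_count and fib_instrumented are illustrative) reveals massive duplication:

```python
call_count = {}

def fib_instrumented(n):
    """Naive Fibonacci, counting how often each argument is seen."""
    call_count[n] = call_count.get(n, 0) + 1
    if n <= 1:
        return n
    return fib_instrumented(n - 1) + fib_instrumented(n - 2)

fib_instrumented(10)
# Keep only the arguments that were computed more than once
duplicates = {k: v for k, v in call_count.items() if v > 1}
print(duplicates)  # fib(2) alone is recomputed 34 times for n=10
```

The counts grow like the Fibonacci numbers themselves, which is exactly the signature of overlapping subproblems.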
Step 3: Verify function is pure
Ask:
- Does it always return the same output for the same input?
- Does it have side effects (print, modify global state, I/O)?
- Are the arguments hashable?
If yes to the first, no to the second, and yes to the third → memoization will work.
Step 4: Apply memoization
Start with @lru_cache:
from functools import lru_cache
@lru_cache(maxsize=None)
def my_function(n):
# ... existing code
Step 5: Measure improvement
Compare before/after performance. If speedup is significant (> 10x), keep it.
Common Failure Modes and Their Signatures
Symptom: Memoization doesn't speed up the code
Python output pattern:
Before: 0.523s
After: 0.518s
Speedup: 1.01x
Diagnostic clues:
- Cache hit rate is very low (< 10%)
- Most calls are cache misses
- Function is rarely called with same arguments
Root cause: No overlapping subproblems. Each call uses unique arguments.
Solution: Memoization won't help. Consider algorithmic improvements instead.
Symptom: TypeError: unhashable type
Python output pattern:
TypeError: unhashable type: 'list'
Diagnostic clues:
- Error occurs when calling the memoized function
- Function takes a list, dict, or set as an argument
Root cause: @lru_cache requires hashable arguments. Lists, dicts, and sets aren't hashable.
Solution: Convert to a hashable type before the cached function is called. Note that converting inside the @lru_cache-decorated function does not work: the cache must hash the arguments before the function body ever runs, so the TypeError is raised first.
# Convert before calling
result = my_function(tuple(my_list))
# Or keep a cached helper and a list-friendly wrapper
@lru_cache(maxsize=None)
def _my_function(items): # items must already be a tuple here
# ... rest of code
def my_function(items):
return _my_function(tuple(items))
Symptom: Stale cached results
Python output pattern:
Expected: 42
Got: 17 (from previous call with different global state)
Diagnostic clues:
- Results are incorrect
- Function depends on global variables or external state
- Results change when global state changes
Root cause: Function isn't pure. It depends on state not captured in arguments.
Solution: Either:
1. Make function pure (pass all dependencies as arguments)
2. Don't use memoization
3. Clear cache when state changes: my_function.cache_clear()
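A minimal reproduction of this failure mode, using a hypothetical tax_rate global:

```python
from functools import lru_cache

tax_rate = 0.10  # external state the function secretly depends on

@lru_cache(maxsize=None)
def price_with_tax(price):
    """Impure: reads a global, so cached results can go stale."""
    return round(price * (1 + tax_rate), 2)

print(price_with_tax(100))  # 110.0
tax_rate = 0.20
print(price_with_tax(100))  # still 110.0 -- stale cached result
price_with_tax.cache_clear()
print(price_with_tax(100))  # 120.0 once the cache is cleared
```

The better fix is option 1: pass tax_rate as an argument so it becomes part of the cache key.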
Symptom: Memory usage grows unbounded
Python output pattern:
MemoryError: unable to allocate array
Diagnostic clues:
- Memory usage grows over time
- Cache size keeps increasing
- Many unique argument combinations
Root cause: Unlimited cache size with many unique inputs.
Solution: Set a maximum cache size:
@lru_cache(maxsize=128) # Keep only 128 most recent results
def my_function(n):
# ... code
Top-Down vs Bottom-Up: Decision Matrix
| Factor | Top-Down (Memoization) | Bottom-Up (DP) |
|---|---|---|
| Code clarity | ✓ Matches problem structure | ✗ Requires iteration order |
| Recursion limit | ✗ Limited by stack depth | ✓ No limit |
| Sparse problems | ✓ Computes only needed values | ✗ Computes all values |
| Memory efficiency | ✗ Stores all computed values | ✓ Can optimize to O(1) |
| Debugging | ✗ Harder to trace | ✓ Easier to trace |
| Performance | ~ Similar for dense problems | ~ Similar for dense problems |
Choose top-down when:
- Problem structure is naturally recursive
- You might not need all subproblems
- Recursion depth is manageable
- Code clarity is priority
Choose bottom-up when:
- You need all subproblems anyway
- Recursion depth is a concern
- Memory optimization is critical
- You can identify clear iteration order
Complexity Analysis: Before and After Memoization
General pattern for recursive problems:
Before memoization:
- Time: Often exponential O(2ⁿ) or worse
- Space: O(n) for call stack
- Recurrence: T(n) = T(n-1) + T(n-2) + O(1) → exponential
After memoization:
- Time: O(n × k) where k is work per subproblem
- Space: O(n) for cache + O(n) for call stack = O(n)
- Each subproblem computed once
After bottom-up DP:
- Time: O(n × k) (same as memoization)
- Space: O(n) for table (can often be optimized to O(1))
- No call stack overhead
Example: Fibonacci
- Naive: O(φⁿ) time, O(n) space
- Memoized: O(n) time, O(n) space
- Bottom-up: O(n) time, O(n) space
- Optimized bottom-up: O(n) time, O(1) space
Lab: Before/After Performance Comparison
Lab: Before/After Performance Comparison
Lab Objective
Systematically measure the performance impact of memoization on recursive functions. You'll implement three versions of each problem (naive, memoized, bottom-up) and compare their performance characteristics.
Lab Setup
import time
from functools import lru_cache
import sys
def benchmark(func, *args, runs=1):
"""Benchmark a function with given arguments."""
times = []
for _ in range(runs):
start = time.time()
result = func(*args)
elapsed = time.time() - start
times.append(elapsed)
avg_time = sum(times) / len(times)
return result, avg_time
def compare_implementations(naive_func, memo_func, dp_func, test_input, label):
"""Compare three implementations of the same problem."""
print(f"\n{'='*60}")
print(f"Problem: {label}")
print(f"Input: {test_input}")
print(f"{'='*60}")
# Unpack tuple inputs into separate positional arguments (e.g., (m, n))
args = test_input if isinstance(test_input, tuple) else (test_input,)
# Naive version
try:
result_naive, time_naive = benchmark(naive_func, *args)
print(f"Naive: {time_naive:.6f}s → result = {result_naive}")
except RecursionError:
print("Naive: RecursionError (stack overflow)")
time_naive = float('inf')
result_naive = None
# Memoized version
if hasattr(memo_func, 'cache_clear'):
memo_func.cache_clear()
result_memo, time_memo = benchmark(memo_func, *args)
print(f"Memoized: {time_memo:.6f}s → result = {result_memo}")
# Check cache statistics
if hasattr(memo_func, 'cache_info'):
info = memo_func.cache_info()
total = info.hits + info.misses
hit_rate = info.hits / total * 100 if total else 0.0
print(f" Cache: {info.hits} hits, {info.misses} misses ({hit_rate:.1f}% hit rate)")
# Bottom-up version
result_dp, time_dp = benchmark(dp_func, *args)
print(f"Bottom-up: {time_dp:.6f}s → result = {result_dp}")
# Verify all results match
if result_naive is not None:
assert result_naive == result_memo == result_dp, "Results don't match!"
else:
assert result_memo == result_dp, "Results don't match!"
# Calculate speedups
if time_naive != float('inf'):
speedup_memo = time_naive / time_memo
speedup_dp = time_naive / time_dp
print(f"\nSpeedup: Memoized is {speedup_memo:.0f}x faster than naive")
print(f" Bottom-up is {speedup_dp:.0f}x faster than naive")
Exercise 1: Fibonacci Comparison
Implement and compare all three versions:
# Naive version (from earlier)
def fib_naive(n):
if n <= 1:
return n
return fib_naive(n - 1) + fib_naive(n - 2)
# Memoized version
@lru_cache(maxsize=None)
def fib_memo(n):
if n <= 1:
return n
return fib_memo(n - 1) + fib_memo(n - 2)
# Bottom-up version
def fib_dp(n):
if n <= 1:
return n
prev2, prev1 = 0, 1
for _ in range(2, n + 1):
current = prev1 + prev2
prev2, prev1 = prev1, current
return prev1
# Compare
compare_implementations(fib_naive, fib_memo, fib_dp, 30, "Fibonacci(30)")
compare_implementations(fib_naive, fib_memo, fib_dp, 35, "Fibonacci(35)")
Expected Output:
============================================================
Problem: Fibonacci(30)
Input: 30
============================================================
Naive: 0.261847s → result = 832040
Memoized: 0.000018s → result = 832040
Cache: 28 hits, 31 misses (47.5% hit rate)
Bottom-up: 0.000012s → result = 832040
Speedup: Memoized is 14547x faster than naive
Bottom-up is 21821x faster than naive
============================================================
Problem: Fibonacci(35)
Input: 35
============================================================
Naive: 3.472156s → result = 9227465
Memoized: 0.000021s → result = 9227465
Cache: 33 hits, 36 misses (47.8% hit rate)
Bottom-up: 0.000014s → result = 9227465
Speedup: Memoized is 165341x faster than naive
Bottom-up is 247997x faster than naive
Exercise 2: Grid Paths
Count the number of paths from the top-left to the bottom-right of an m×n grid, moving only right or down.
Your task: Implement all three versions.
# Naive version
def grid_paths_naive(m, n):
"""Count paths in mΓn grid (naive recursion)."""
# Base cases: edge of grid
if m == 1 or n == 1:
return 1
# Can arrive from top or left
return grid_paths_naive(m - 1, n) + grid_paths_naive(m, n - 1)
# Memoized version
@lru_cache(maxsize=None)
def grid_paths_memo(m, n):
"""Count paths in mΓn grid (memoized)."""
if m == 1 or n == 1:
return 1
return grid_paths_memo(m - 1, n) + grid_paths_memo(m, n - 1)
# Bottom-up version
def grid_paths_dp(m, n):
"""Count paths in mΓn grid (bottom-up DP)."""
# Create table
dp = [[0] * n for _ in range(m)]
# Base cases: first row and column
for i in range(m):
dp[i][0] = 1
for j in range(n):
dp[0][j] = 1
# Fill table
for i in range(1, m):
for j in range(1, n):
dp[i][j] = dp[i-1][j] + dp[i][j-1]
return dp[m-1][n-1]
# Compare
compare_implementations(grid_paths_naive, grid_paths_memo, grid_paths_dp, (10, 10), "Grid Paths (10×10)")
compare_implementations(grid_paths_naive, grid_paths_memo, grid_paths_dp, (15, 15), "Grid Paths (15×15)")
Expected Output:
============================================================
Problem: Grid Paths (10×10)
Input: (10, 10)
============================================================
Naive: 0.012847s → result = 48620
Memoized: 0.000034s → result = 48620
Cache: 45 hits, 55 misses (45.0% hit rate)
Bottom-up: 0.000028s → result = 48620
Speedup: Memoized is 378x faster than naive
Bottom-up is 459x faster than naive
============================================================
Problem: Grid Paths (15×15)
Input: (15, 15)
============================================================
Naive: 1.847293s → result = 40116600
Memoized: 0.000067s → result = 40116600
Cache: 105 hits, 120 misses (46.7% hit rate)
Bottom-up: 0.000051s → result = 40116600
Speedup: Memoized is 27571x faster than naive
Bottom-up is 36221x faster than naive
Exercise 3: Partition Problem
Given a set of positive integers, determine if it can be partitioned into two subsets with equal sum.
Example: [1, 5, 11, 5] → True (partition into [1, 5, 5] and [11])
Your task: Implement all three versions.
# Naive version
def can_partition_naive(nums, target, index=0):
"""Check if nums can be partitioned to sum to target (naive)."""
# Base cases
if target == 0:
return True
if target < 0 or index >= len(nums):
return False
# Try including or excluding current number
include = can_partition_naive(nums, target - nums[index], index + 1)
exclude = can_partition_naive(nums, target, index + 1)
return include or exclude
def can_partition_wrapper_naive(nums):
"""Wrapper to check if nums can be partitioned into equal subsets."""
total = sum(nums)
if total % 2 != 0:
return False
return can_partition_naive(nums, total // 2)
# Memoized version
@lru_cache(maxsize=None)
def can_partition_memo(nums, target, index=0):
"""Check if nums can be partitioned to sum to target (memoized)."""
if target == 0:
return True
if target < 0 or index >= len(nums):
return False
include = can_partition_memo(nums, target - nums[index], index + 1)
exclude = can_partition_memo(nums, target, index + 1)
return include or exclude
def can_partition_wrapper_memo(nums):
"""Wrapper for memoized version."""
total = sum(nums)
if total % 2 != 0:
return False
return can_partition_memo(tuple(nums), total // 2)
# Bottom-up version
def can_partition_dp(nums):
"""Check if nums can be partitioned into equal subsets (bottom-up DP)."""
total = sum(nums)
if total % 2 != 0:
return False
target = total // 2
# dp[i] = True if sum i is achievable
dp = [False] * (target + 1)
dp[0] = True # Sum of 0 is always achievable (empty subset)
# For each number
for num in nums:
# Update dp table (iterate backwards to avoid using same number twice)
for i in range(target, num - 1, -1):
dp[i] = dp[i] or dp[i - num]
return dp[target]
# Compare
test_cases = [
[1, 5, 11, 5],
[1, 2, 3, 5],
[1, 2, 5, 6, 8],
]
for nums in test_cases:
compare_implementations(
can_partition_wrapper_naive,
can_partition_wrapper_memo,
can_partition_dp,
nums,
f"Partition {nums}"
)
Expected Output:
============================================================
Problem: Partition [1, 5, 11, 5]
Input: [1, 5, 11, 5]
============================================================
Naive: 0.000124s → result = True
Memoized: 0.000031s → result = True
Cache: 8 hits, 16 misses (33.3% hit rate)
Bottom-up: 0.000019s → result = True
Speedup: Memoized is 4x faster than naive
Bottom-up is 7x faster than naive
============================================================
Problem: Partition [1, 2, 3, 5]
Input: [1, 2, 3, 5]
============================================================
Naive: 0.000087s → result = False
Memoized: 0.000024s → result = False
Cache: 6 hits, 14 misses (30.0% hit rate)
Bottom-up: 0.000016s → result = False
Speedup: Memoized is 4x faster than naive
Bottom-up is 5x faster than naive
============================================================
Problem: Partition [1, 2, 5, 6, 8]
Input: [1, 2, 5, 6, 8]
============================================================
Naive: 0.000234s → result = True
Memoized: 0.000041s → result = True
Cache: 12 hits, 22 misses (35.3% hit rate)
Bottom-up: 0.000027s → result = True
Speedup: Memoized is 6x faster than naive
Bottom-up is 9x faster than naive
Lab Questions
Answer these questions based on your experiments:
1. Performance Scaling
For Fibonacci, how does the speedup change as n increases from 20 to 35?
Answer: The speedup increases dramatically. For F(20), memoization is ~180x faster. For F(35), it's ~165,000x faster. This is because the naive version's time complexity is exponential O(φⁿ), so doubling n roughly squares the running time, while the memoized version remains linear O(n).
2. Cache Hit Rates
Why is the cache hit rate for Fibonacci around 47-48%, not higher?
Answer: For F(n), we compute n+1 unique values (F(0) through F(n)). Each value is computed once (cache miss), then potentially reused (cache hit). The hit rate depends on the recursion tree structure. For Fibonacci, each non-base case makes 2 recursive calls, leading to roughly equal hits and misses.
3. Space-Time Tradeoffs
Which version uses the most memory? Which is fastest?
Answer:
- Most memory: Memoized version (stores all computed values + call stack)
- Fastest: Usually bottom-up for dense problems (no function call overhead)
- Least memory: Space-optimized bottom-up (O(1) space)
4. When Memoization Doesn't Help
Create a recursive function where memoization provides no benefit:
# Example: Recursive function with no overlapping subproblems
def sum_to_n_naive(n):
"""Sum integers from 1 to n (naive recursion)."""
if n == 0:
return 0
return n + sum_to_n_naive(n - 1)
@lru_cache(maxsize=None)
def sum_to_n_memo(n):
"""Sum integers from 1 to n (memoized)."""
if n == 0:
return 0
return n + sum_to_n_memo(n - 1)
# Compare
for n in [100, 500, 900]:
print(f"\nSum to {n}:")
_, time_naive = benchmark(sum_to_n_naive, n)
sum_to_n_memo.cache_clear()
_, time_memo = benchmark(sum_to_n_memo, n)
print(f" Naive: {time_naive:.6f}s")
print(f" Memoized: {time_memo:.6f}s")
print(f" Speedup: {time_naive / time_memo:.2f}x")
info = sum_to_n_memo.cache_info()
print(f" Cache hit rate: {info.hits / (info.hits + info.misses) * 100:.1f}%")
Expected Output:
Sum to 100:
Naive: 0.000034s
Memoized: 0.000041s
Speedup: 0.83x
Cache hit rate: 0.0%
Sum to 500:
Naive: 0.000167s
Memoized: 0.000203s
Speedup: 0.82x
Cache hit rate: 0.0%
Sum to 900:
Naive: 0.000298s
Memoized: 0.000361s
Speedup: 0.83x
Cache hit rate: 0.0%
Why no benefit: Each call to sum_to_n(k) is made exactly once (when computing sum_to_n(k+1)). There are no overlapping subproblems, so memoization adds overhead without benefit. The 0% hit rate confirms this.
Lab Summary
Key Takeaways:
- Memoization transforms exponential algorithms to polynomial (often O(2ⁿ) → O(n))
- Cache hit rate indicates overlapping subproblems (high hit rate = memoization helps)
- Bottom-up DP avoids recursion limits and can be more space-efficient
- Not all recursive functions benefit from memoization (need overlapping subproblems)
- Measure before optimizing - use benchmarking to verify improvements
When to use each approach:
- Naive recursion: Never for production (unless n is tiny)
- Memoization: When recursion is natural and depth is manageable
- Bottom-up DP: When you need all subproblems or recursion depth is a concern
- Space-optimized DP: When memory is constrained and you only need the final result
Chapter Summary: The Journey from Exponential to Linear
Chapter Summary: The Journey from Exponential to Linear
The Complete Journey
We started with a simple, elegant recursive implementation of Fibonacci that was completely unusable for even moderate inputs. Through systematic diagnosis and progressive refinement, we transformed it into a production-ready solution.
| Iteration | Problem | Technique Applied | Result | Complexity |
|---|---|---|---|---|
| 0 | Naive recursion | None | F(35) in 3.47s | O(φⁿ) time, O(n) space |
| 1 | Exponential redundancy | Manual memoization | F(35) in 0.00002s | O(n) time, O(n) space |
| 2 | Verbose cache management | @lru_cache decorator | Same performance, cleaner code | O(n) time, O(n) space |
| 3 | Recursion depth limits | Bottom-up DP | F(10000) in 0.05s | O(n) time, O(n) space |
| 4 | Memory constraints | Space-optimized DP | F(10000) in 0.05s | O(n) time, O(1) space |
Final Implementation Comparison
Top-Down Memoized (Best for: Natural recursion, sparse problems):
from functools import lru_cache
@lru_cache(maxsize=None)
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
Bottom-Up DP (Best for: Dense problems, deep recursion):
def fibonacci(n):
if n <= 1:
return n
prev2, prev1 = 0, 1
for _ in range(2, n + 1):
current = prev1 + prev2
prev2, prev1 = prev1, current
return prev1
Decision Framework: Which Approach When?
Use Top-Down Memoization when:
- Problem structure is naturally recursive (trees, divide-and-conquer)
- You might not need all subproblems (sparse computation)
- Recursion depth is manageable (n < 1000 in Python)
- Code clarity is more important than maximum performance
- You want automatic caching with @lru_cache
Use Bottom-Up DP when:
- You need all subproblems anyway (dense computation)
- Recursion depth might exceed limits
- You want maximum performance (no function call overhead)
- You can identify a clear iteration order
- Memory optimization is possible (sliding window pattern)
Use Space-Optimized DP when:
- Memory is constrained
- You only need the final result (not intermediate values)
- Problem has a "sliding window" structure (only need last k values)
- You're computing very large inputs
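As a sketch of the sliding-window pattern, the grid-paths table from the lab can be collapsed to a single row, since each cell depends only on the cell above it and the cell to its left (grid_paths_dp_1row is our name for this variant):

```python
def grid_paths_dp_1row(m, n):
    """Count right/down lattice paths using O(n) space instead of O(m*n)."""
    row = [1] * n  # first row of the DP table: one path along the top edge
    for _ in range(1, m):
        for j in range(1, n):
            # Before the update, row[j] holds dp[i-1][j] (cell above);
            # row[j-1] already holds dp[i][j-1] (cell to the left).
            row[j] += row[j - 1]
    return row[-1]

print(grid_paths_dp_1row(10, 10))  # 48620, matching the 2-D version
```

Scanning left to right is what makes the in-place update safe: each cell is overwritten only after its old value has been consumed.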
Recognizing Memoization Opportunities
The pattern: Look for these characteristics in recursive functions:
1. Overlapping subproblems: Same inputs computed multiple times
- Symptom: Exponential time complexity despite polynomial problem size
- Test: Add call counting, check for duplicates
2. Optimal substructure: Solution built from subproblem solutions
- Symptom: Recursive case combines results from smaller inputs
- Test: Can you express the solution as a function of smaller solutions?
3. Pure function: Deterministic, no side effects
- Symptom: Same inputs always produce same outputs
- Test: Function doesn't depend on external state
Common problem types that benefit:
- Fibonacci-like sequences (linear recurrence relations)
- Combinatorial problems (permutations, subsets, partitions)
- Path counting (grids, graphs, trees)
- String problems (edit distance, longest common subsequence)
- Optimization problems (knapsack, coin change, scheduling)
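As one example from this list, coin change follows the same template. This is a rough sketch (min_coins and the COINS denominations are our own illustration), not an optimized solution:

```python
from functools import lru_cache

COINS = (1, 5, 10, 25)  # assumed denominations for the sketch

@lru_cache(maxsize=None)
def min_coins(amount):
    """Fewest coins from COINS summing to amount (inf if unreachable)."""
    if amount == 0:
        return 0
    if amount < 0:
        return float('inf')
    # Optimal substructure: best answer extends a best smaller answer
    return 1 + min(min_coins(amount - c) for c in COINS)

print(min_coins(63))  # 6 (25 + 25 + 10 + 1 + 1 + 1)
```

Note the checklist items in action: overlapping subproblems (many paths reach the same remaining amount), optimal substructure (the recurrence), and purity (COINS is a constant, so results depend only on amount).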
Complexity Analysis Summary
Recurrence Relations:
Fibonacci: T(n) = T(n-1) + T(n-2) + O(1)
- Naive: Solves to O(φⁿ) ≈ O(1.618ⁿ)
- Memoized: Each subproblem computed once → O(n)
Grid Paths: T(m,n) = T(m-1,n) + T(m,n-1) + O(1)
- Naive: O(2^(m+n))
- Memoized: O(m × n)
General Pattern:
- If each subproblem branches into k subproblems: O(kⁿ) naive
- With memoization: O(number of unique subproblems × work per subproblem)
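This count can be checked empirically: after a single top-level call, cache_info().misses equals the number of unique subproblems actually computed. For memoized Fibonacci that should be n + 1 (the block below mirrors the lab's fib_memo):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

fib_memo(35)
info = fib_memo.cache_info()
print(info.misses)  # 36 -- one miss per unique subproblem F(0)..F(35)
print(info.hits)    # 33 -- every other lookup was served from the cache
```

Linear misses in n is direct evidence that memoization collapsed the exponential call tree to O(n) work.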
Common Pitfalls and Solutions
Pitfall 1: Forgetting to pass memo dictionary
# Wrong - memo not passed to recursive calls
def fib(n, memo={}):
if n in memo:
return memo[n]
result = fib(n-1) + fib(n-2) # memo not passed!
memo[n] = result
return result
Solution: Use @lru_cache to avoid manual parameter passing.
Pitfall 2: Mutable default argument
# Wrong - memo={} is shared across all calls
def fib(n, memo={}):
# ...
Solution: Use memo=None and initialize inside function, or use @lru_cache.
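A corrected manual version using the memo=None idiom might look like this (a sketch; in practice @lru_cache is simpler):

```python
def fib(n, memo=None):
    """Manually memoized Fibonacci, avoiding the shared-default trap."""
    if memo is None:
        memo = {}  # fresh cache per top-level call, not shared across calls
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    # Pass memo down so all recursive calls share one cache
    result = fib(n - 1, memo) + fib(n - 2, memo)
    memo[n] = result
    return result

print(fib(35))  # 9227465
```

This fixes both pitfalls at once: the memo is threaded through every recursive call, and it is created inside the function rather than as a mutable default.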
Pitfall 3: Unhashable arguments
# Wrong - lists aren't hashable
@lru_cache(maxsize=None)
def process(items): # items is a list
# ...
Solution: Convert to tuple: items = tuple(items) or pass tuple from caller.
Pitfall 4: Caching functions with side effects
# Wrong - function has side effects
@lru_cache(maxsize=None)
def process(n):
print(f"Processing {n}") # Side effect!
return n * 2
Solution: Don't memoize functions with side effects. Cache only pure functions.
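If the side effect is incidental (logging, say), one common refactor is to split out the pure computation and memoize only that; the names compute and process below are illustrative:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compute(n):
    """Pure part of the work: safe to memoize."""
    return n * 2

def process(n):
    """Impure wrapper: the print runs on every call, not only on cache misses."""
    print(f"Processing {n}")
    return compute(n)

print(process(3))  # prints "Processing 3", then 6
```

This keeps the observable behavior (one log line per call) while still caching the expensive computation.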
Performance Expectations
Typical speedups with memoization:
- Fibonacci-like problems: 10,000x - 1,000,000x for n=30-40
- Grid path problems: 100x - 10,000x for moderate grids
- Combinatorial problems: 10x - 1,000x depending on branching factor
When speedup is minimal (< 2x):
- No overlapping subproblems (each call is unique)
- Function is already fast (< 1ms)
- Overhead of caching exceeds benefit
Lessons Learned
1. Exponential algorithms are unusable in practice
- F(40) takes 38 seconds with naive recursion
- Same computation takes 0.00002s with memoization
- The difference between theory and practice is stark
2. Overlapping subproblems are the key indicator
- If same inputs computed multiple times → memoization helps
- If each input computed once → memoization adds overhead
- Measure with call counting to verify
3. Python provides excellent tools
- @lru_cache makes memoization trivial
- cache_info() provides diagnostic data
- But understand the underlying technique
4. Multiple valid approaches exist
- Top-down: Natural, matches problem structure
- Bottom-up: Avoids recursion limits, enables space optimization
- Choose based on constraints and priorities
5. Measure, don't guess
- Use benchmarking to verify improvements
- Check cache hit rates to validate overlapping subproblems
- Profile before optimizing
Next Steps
You now understand:
- ✓ How to recognize when memoization will help
- ✓ How to implement memoization manually and with decorators
- ✓ How to convert top-down to bottom-up DP
- ✓ How to optimize space complexity
- ✓ How to measure and compare performance
In the next chapter (Tail Recursion), we'll explore another optimization technique: converting recursion to tail-recursive form, which some languages can optimize to iteration automatically. While Python doesn't perform this optimization, understanding tail recursion deepens your grasp of recursive execution and prepares you for languages that do optimize it.
Practice problems:
1. Implement memoized versions of problems from earlier chapters
2. Convert your memoized solutions to bottom-up DP
3. Measure performance improvements with benchmarking
4. Identify which problems benefit most from memoization
5. Practice recognizing overlapping subproblems in new problems